Evaluation of GALS Methods in Scaled CMOS Technology: Moonrake Chip Experience

Authors

  • Milos Krstic
  • Xin Fan
  • Eckhard Grass
  • Luca Benini
  • Mohammad Reza Kakoee
  • Christoph Heer
  • Birgit Sanders
  • Alessandro Strano
  • Davide Bertozzi
Abstract

In this paper the authors present the concept and evaluation results of a complex GALS ASIC demonstrator in a 40 nm CMOS process. This chip, named Moonrake, compares synchronous and GALS synchronization technology in a homogeneous experimental setting: same baseline designs, same manufacturing process, same die. The chip validates GALS technology for both point-to-point and network-centric on-chip communication, demonstrating its potential for different applications. The design analysis, measurement, and test results confirm the potential of the GALS approach for scaled technologies, showing significant benefits with respect to area, power, and EMI in complex system implementation. Furthermore, 91% of the tests performed on the GALS network-on-chip test structures completed successfully, validating the timing robustness of new area- and latency-efficient synchronization schemes and proving that the design flow for GALS synchronization technology can be implemented by means of mainstream industrial tools.

DOI: 10.4018/jertcs.2012100101

International Journal of Embedded and Real-Time Communication Systems, 3(4), 1-18, October-December 2012. Copyright © 2012, IGI Global.

INTRODUCTION

Globally Asynchronous Locally Synchronous (GALS) technology was proposed many years ago as an alternative to the traditional synchronous paradigm for chip synchronization (Krstic, 2006). Although academia has reported significant potential, the GALS methodology has never taken off in industry. However, the growing challenges imposed by the unrelenting pace of technology scaling to the nanoscale regime call for an efficient and safe system-level integration methodology. Consequently, we have targeted the implementation of a chip, named Moonrake, in an advanced 40 nm CMOS process, aiming at the assessment of GALS technology for nanoscale designs.
Our intention was to evaluate GALS against standard synchronous technology on the same die by implementing synchronous and GALS counterparts of the same baseline designs, in both the point-to-point and the network-on-chip (NoC) scenarios for on-chip communication. The two scenarios are very different, which motivates the choice of different baseline designs for their analysis. In point-to-point communication, once an optimized GALS interface is selected, the focus is on the implications of redesigning an entire system around these links. In this direction, we took a state-of-the-art multi-million gate synchronous system, an OFDM baseband transmitter developed for a 60 GHz transceiver with gigabit throughput as presented by Krstic in 2008, and re-implemented it with GALS methodology, using the optimized interfaces for pausible (stoppable) clocking as defined by Fan in 2009. One major goal was to explore the Electromagnetic Interference (EMI) and switching noise properties of GALS designs, together with special algorithms and circuits for noise reduction based on the GALS methodology, initially analyzed by Fan in 2010. Within the chip, the switching noise (and correspondingly EMI) is caused by simultaneous switching activity of the digital circuits, and it can lead to various problems including ground bounce, degraded power integrity, IR drop, and substrate noise. For on-chip networking applications, the communication landscape is more heterogeneous, since it results from the interconnection of domains with different synchronization assumptions. Therefore, our focus was on the provision of flexible and cost-effective interfaces for arbitrary composability. In this direction, the novel synchronization interfaces presented by Strano (2010) and Ludovici (2010), which aim at low area, power, and latency overhead while preserving timing robustness, were integrated into NoC test structures exposing (and comparing) a range of flexible GALS solutions.
The contributions of this paper are as follows:

  • The GALS partitioning criteria for a state-of-the-art OFDM transmitter are presented, highlighting the optimized asynchronous link crossing scheme and the partitioning granularity and strategy at the system level.
  • The design flow followed for different GALS systems is illustrated: from pausible clocking to mesochronous synchronization to mixed-timing systems. Compatibility with mainstream standard cell libraries and design toolflows is discussed.
  • The feasibility of GALS NoCs linking sub-systems with heterogeneous timing assumptions by means of area/power/latency optimized interfaces while preserving timing margins has been demonstrated.
  • Synchronous and GALS counterparts of the same baseline designs (the OFDM transmitter and a NoC sub-set), implemented in the same demonstrator chip, have been compared in terms of area, pointing out counterintuitive benefits of the GALS design style.
  • Finally, the test and measurement results of the Moonrake chip are presented and analyzed, with the focus on EMI and power measurements showing the benefits of GALS for complex system integration. Additionally, NoC test structures getting the clock from the external world provided an excellent result: frequencies from 25 to ...

Similar resources

Design Flow for a 3-Million Transistor GALS Test Chip

We present a design methodology that has been employed on a GALS test chip with three million transistors using a 0.25 μm CMOS technology. The chip contains 25 GALS modules interconnected using four different bus architectures, various additional test structures, and occupies a total area of 25 mm². Hierarchical composition and timing verification, in conjunction with a small library of self-timed...

High Performance and Energy Efficient Multi-core Systems for DSP Applications

This dissertation investigates the architecture design, physical implementation, result evaluation, and feature analysis of a multi-core processor for DSP applications. The system is composed of a 2-D array of simple single-issue programmable processors interconnected by a reconfigurable mesh network, and processors operate completely asynchronously with respect to each other in a Globally Asyn...

New Method for Analysis of image sensor to produce and evaluate the image

In this paper, a new method for evaluating CMOS image sensors based on computer modeling and analysis is introduced. Image sensors are composed of different parts, each of which has a specific effect on image quality. Circuits of image sensors can be evaluated and analyzed using circuit simulators or theoretically, but these methods cannot help to produce the final image. In order to produce th...

Dependency Coefficient in Computerized GALS Examination Utilizing Motion Analysis Techniques

Objectives: The GALS (Gait, Arms, Legs and Spine) examination is a compact version of standard procedures used by rheumatologists to determine musculoskeletal disorders in patients. Computerization of such a clinical procedure is necessary to ensure an objective evaluation. This article presents the first steps in such an approach by outlining a procedure to use motion analysis techniques as a ...

Advances in asynchronous logic: from principles to GALS & NoC, recent industry applications, and commercial CAD tools

The growing variability and complexity of advanced CMOS technologies makes the physical design of clocked logic in large Systems-on-Chip more and more challenging. Asynchronous logic has been studied for many years and become an attractive solution for a broad range of applications, from massively parallel multi-media systems to systems with ultra-low power & low-noise constraints, like cryptog...

Journal title:
  • IJERTCS

Volume 3, Issue 4

Pages 1-18

Publication date 2012